334 PART 6 Analyzing Survival Data
software estimates the coefficients of the predictor variables that make the predicted survival curves agree as closely as possible with the observed survival times of the participants.
How does PH regression determine these regression coefficients? The short answer is, "You'll be sorry you asked!" The longer answer is that, like many other kinds of regression, PH regression is based on maximum likelihood estimation. The software uses the data to build a long, complicated expression for the probability of one particular individual in the data dying at any point in time. This expression involves that individual's predictor values and the regression coefficients. Next, the software constructs an even longer expression for the likelihood of getting exactly the observed survival times for all the participants in the data set. And as if that weren't complicated enough, the expression also has to account for censored data. Finally, the software searches for the values of the regression coefficients that maximize this very long likelihood expression (much as maximum likelihood is described for logistic regression in Chapter 18).
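The chapter deliberately skips the formula, but the idea can be seen in miniature. Most PH (Cox) regression software maximizes a "partial" likelihood: for each observed death, the probability that it was that particular participant, rather than anyone else still at risk, who died at that time. The Python sketch below is a minimal illustration under stated assumptions: one predictor, no tied death times (real packages handle ties with methods such as Breslow's or Efron's), and Newton-Raphson iteration to find the maximizing coefficient. The function names and the toy data are hypothetical. Censored participants never contribute a death term of their own, but they remain in the risk sets until they leave the study, which is how the likelihood accounts for censoring.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for a single predictor.

    Assumes no two deaths share the same time (no ties).
    times  -- follow-up time for each participant
    events -- 1 if the participant died at that time, 0 if censored
    x      -- each participant's predictor value
    """
    n = len(times)
    loglik = 0.0
    for i in range(n):
        if events[i] != 1:
            continue  # censored participants appear only in risk sets
        # Risk set: everyone still under observation just before times[i]
        risk = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(math.exp(beta * x[j]) for j in risk)
        # Log of the probability that participant i (not someone else
        # in the risk set) was the one who died at this time
        loglik += beta * x[i] - math.log(denom)
    return loglik


def fit_cox_one_predictor(times, events, x, tol=1e-10, max_iter=50):
    """Find the coefficient that maximizes the partial likelihood (Newton-Raphson).

    Assumes at least one risk set mixes different x values (otherwise info is 0).
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if events[i] != 1:
                continue
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0           # first derivative of the log-likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data: x = 1 for exposed, 0 for unexposed
times = [2, 3, 5, 7, 8, 11, 12, 15]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]

beta = fit_cox_one_predictor(times, events, x)
print(f"coefficient = {beta:.4f}, hazard ratio = {math.exp(beta):.3f}")
```

Newton-Raphson is used here only because it's short and converges quickly on this well-behaved likelihood; production software adds safeguards (step halving, tie handling, standard errors) that this sketch omits.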
Hazard ratios
Hazard ratios (HRs) are the estimates of relative risk obtained from PH regression. HRs play the same role in survival regression that odds ratios play in logistic regression, and they're calculated from the regression output the same way: by exponentiating the regression coefficients.
» In logistic regression: $\text{Odds ratio} = e^{\text{Regression Coefficient}}$
» In PH regression: $\text{Hazard ratio} = e^{\text{Regression Coefficient}}$
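To make the conversion concrete, here's a minimal Python sketch. The predictor names and coefficient values are hypothetical, not taken from any real analysis. A coefficient of 0 corresponds to an HR of 1 (no effect on the hazard), a positive coefficient to an HR above 1, and a negative coefficient to an HR below 1.

```python
import math


def hazard_ratio(coefficient):
    """Hazard ratio for a 1-unit increase in a predictor: HR = e ** coefficient."""
    return math.exp(coefficient)


def coefficient_from_hr(hr):
    """Inverse conversion: the regression coefficient is the natural log of the HR."""
    return math.log(hr)


# Hypothetical coefficients, as they might appear in PH regression output
coefficients = {"age_years": 0.0488, "treated": -0.6931}
for name, b in coefficients.items():
    # Positive coefficient -> HR > 1 (higher hazard); negative -> HR < 1
    print(f"{name}: coefficient = {b:+.4f}, HR = {hazard_ratio(b):.3f}")
```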
Keep in mind that hazard is the chance of dying in any small interval of time. For each predictor variable in a PH regression model, the software produces a coefficient that, when exponentiated, equals the HR. The HR tells you the factor by which the hazard rate is multiplied when the variable's value increases by exactly 1.0 unit. For a yes/no predictor coded 1 and 0, that's the hazard of the participants who have the characteristic compared with the hazard of the comparison group. Therefore, an HR's numerical value depends on the units in which the variable is expressed in your data. And for categorical predictors, interpreting the HR depends on how you code the categories.
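The dependence on units and coding can be sketched in a few lines of Python. Because the HR is e raised to the coefficient, expressing a predictor in units k times larger (years to decades, say) multiplies its coefficient by k and so raises the per-unit HR to the kth power, while flipping which category of a yes/no predictor is coded 1 turns the HR into its reciprocal. The helper names and HR values below are hypothetical.

```python
import math


def hr_for_k_units(hr_per_unit, k):
    """HR for a k-unit increase; equivalent to exp(k * coefficient)."""
    return hr_per_unit ** k


def hr_with_reversed_coding(hr):
    """HR after swapping which category of a 0/1 predictor is coded 1."""
    return 1.0 / hr


# Hypothetical: age entered in years gives HR = 1.05 per year
hr_per_year = 1.05
hr_per_decade = hr_for_k_units(hr_per_year, 10)  # same model, age in decades
print(f"HR per year = {hr_per_year:.2f}, HR per decade = {hr_per_decade:.3f}")

# Hypothetical: coding treated = 1, untreated = 0 gives HR = 0.60
hr_treated_vs_untreated = 0.60
hr_untreated_vs_treated = hr_with_reversed_coding(hr_treated_vs_untreated)
print(f"HR untreated vs treated = {hr_untreated_vs_treated:.3f}")
```

The model's fit is identical in both cases; only the scale on which the HR is reported changes, which is why the units and coding should always be stated alongside an HR.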
For example, suppose a survival regression model in a study of emphysema patients includes the number of cigarettes smoked per day as a predictor of survival, and the HR for this variable comes out equal to 1.05. Then a participant's chance of dying at any instant increases by a factor of 1.05 (5 percent) for every additional cigarette smoked per day. A 5 percent increase may not seem like much, but it's applied for every additional cigarette per day. A person who smokes one pack (20 cigarettes)